
(CVPR 2015) Deep neural networks are easily fooled: High confidence predictions for unrecognizable images

Nguyen A, Yosinski J, Clune J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 427-436.



1. Overview


1.1. Motivation

A recent study revealed that changing an image in a way imperceptible to humans can cause a DNN to label the image as something else entirely.

In this paper, the authors:

  • show that it is easy to produce images that are completely unrecognizable to humans, yet a DNN believes with 99.99% confidence that they are recognizable objects
  • use evolutionary algorithms with two image encodings: a direct encoding (raw pixels) and an indirect encoding (CPPN)
  • also use gradient ascent in pixel space

1.2. Procedure
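
Candidate images are generated by an evolutionary algorithm; each image is scored by the trained DNN, and the confidence assigned to a target class is used as the fitness (the paper uses MAP-Elites so that images for all classes are evolved in a single run). Alternatively, images are optimized directly with gradient ascent (Section 2.3).

Below is a minimal sketch of the evolutionary loop, simplified to a single-class hill climber; `dnn_confidence` and `mutate` are hypothetical placeholders standing in for the trained network's forward pass and for the encodings described in Section 2.

```python
import numpy as np

def dnn_confidence(image, target_class):
    """Hypothetical helper: forward `image` through the trained DNN
    (LeNet / AlexNet in the paper) and return its softmax confidence
    for `target_class`."""
    raise NotImplementedError

def mutate(image, generation, rng):
    """Hypothetical helper: perturb the image encoding
    (direct pixel encoding or CPPN, see Section 2)."""
    raise NotImplementedError

def evolve_fooling_image(target_class, shape=(28, 28), generations=20000, seed=0):
    """Single-class hill climber: the fitness is simply the DNN's confidence."""
    rng = np.random.default_rng(seed)
    parent = rng.uniform(0, 255, size=shape)        # random starting image
    parent_fit = dnn_confidence(parent, target_class)
    for g in range(generations):
        child = mutate(parent, g, rng)
        child_fit = dnn_confidence(child, target_class)
        if child_fit > parent_fit:                  # keep the child only if the DNN is more confident
            parent, parent_fit = child, child_fit
    return parent, parent_fit
```
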



1.3. Dataset

  • MNIST
  • ImageNet

1.4. Network

  • AlexNet
  • LeNet
  • CaffeNet

1.5. Discussion

  • the area a discriminative model allocates to a class may be much larger than the area occupied by the training examples for that class, so synthetic images far from the training data can still receive high confidence



  • Applications / security implications: fooling face or voice recognition in security cameras, manipulating search-engine rankings (e.g., via an image's background), and attacking driverless cars with generated fooling images



2. Generating Fooling Images


2.1. Direct Encoding



  • Evolved images are unrecognizable to humans.
    • each pixel is initialized uniformly at random within [0, 255]
    • each pixel has a 10% chance of being mutated; the mutation rate is halved every 1000 generations
    • chosen pixels are perturbed with the polynomial mutation operator at a fixed mutation strength of 15 (a sketch of this operator follows the list)
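
A minimal sketch of this mutation step, assuming NumPy; the operator follows Deb's standard polynomial mutation with strength η = 15, and the per-pixel `rate` is expected to be decayed by the caller as described above.

```python
import numpy as np

def polynomial_mutation(pixels, rate, eta=15.0, lo=0.0, hi=255.0, rng=None):
    """Mutate each pixel with probability `rate` using polynomial mutation.

    Larger `eta` (mutation strength) keeps mutated values closer to the parent;
    the paper fixes eta = 15.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = np.array(pixels, dtype=float)                 # work on a float copy
    mask = rng.random(out.shape) < rate                 # which pixels get mutated
    u = rng.random(out.shape)
    delta = np.where(
        u < 0.5,
        (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0,                 # negative perturbation
        1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0)),         # positive perturbation
    )
    out[mask] += delta[mask] * (hi - lo)                # scale by the pixel range
    return np.clip(out, lo, hi)
```
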

2.2. Indirect Encoding



  • Evolved images are more recognizable to humans.
    • produces images containing compressible patterns (symmetry, repetition)
    • based on a Compositional Pattern-Producing Network (CPPN): it takes a pixel's (x, y) coordinates as input and outputs that pixel's value (a toy sketch follows the list)
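
A toy sketch of the CPPN idea, assuming NumPy: the network maps each pixel's coordinates (plus its distance from the center) to an intensity. The fixed topology, activation choices, and random weights here are illustrative only; the paper evolves CPPN topology and weights with NEAT.

```python
import numpy as np

def render_cppn(weights, size=64):
    """Render a grayscale image from a tiny fixed-topology CPPN:
    inputs (x, y, r) -> three hidden units (sine / Gaussian / sigmoid) -> one output pixel."""
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1) * 2.0 - 1.0   # coordinates in [-1, 1]
    r = np.sqrt(xs ** 2 + ys ** 2)                               # distance from the image center
    inputs = np.stack([xs, ys, r], axis=-1)                      # shape (size, size, 3)

    w_hidden, w_out = weights                                    # shapes (3, 3) and (3,)
    h = inputs @ w_hidden                                        # hidden pre-activations
    # Mixing periodic and symmetric activations is what yields the regular,
    # repetitive patterns characteristic of CPPN images.
    h = np.stack([np.sin(h[..., 0]),
                  np.exp(-h[..., 1] ** 2),
                  1.0 / (1.0 + np.exp(-h[..., 2]))], axis=-1)
    out = 1.0 / (1.0 + np.exp(-(h @ w_out)))                     # pixel intensity in (0, 1)
    return (out * 255).astype(np.uint8)

# Example: render one pattern from random weights (in the paper, evolution would
# search over these weights and over the network topology itself).
rng = np.random.default_rng(0)
image = render_cppn((rng.normal(size=(3, 3)), rng.normal(size=3)))
```
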


2.3. Gradient Ascent

  • maximize the softmax output for a target class via gradient ascent in pixel space to find a fooling image (sketched below)
  • employ L2 regularization to produce images with some recognizable class features (e.g., a dog face, fox ears)
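
A minimal sketch of the gradient-ascent variant, assuming PyTorch and some pretrained ImageNet classifier passed in as `model` (the paper's own experiments use Caffe); SGD's `weight_decay` term supplies the L2 regularization.

```python
import torch

def gradient_ascent_fooling(model, target_class, steps=200, lr=1.0, weight_decay=1e-3,
                            shape=(1, 3, 224, 224)):
    """Ascend the softmax probability of `target_class` directly in pixel space."""
    model.eval()
    x = torch.zeros(shape, requires_grad=True)                    # start from a blank image
    opt = torch.optim.SGD([x], lr=lr, weight_decay=weight_decay)  # weight_decay = L2 penalty on pixels
    for _ in range(steps):
        opt.zero_grad()
        prob = torch.softmax(model(x), dim=1)[0, target_class]
        (-prob).backward()                                        # minimizing -prob == maximizing prob
        opt.step()
    with torch.no_grad():                                         # final confidence of the optimized image
        prob = torch.softmax(model(x), dim=1)[0, target_class]
    return x.detach(), prob.item()
```
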



3. Experiments


3.1. Direct Encoding on ImageNet

The direct encoding is much less successful at producing high-confidence images on ImageNet than on MNIST (larger dataset with more classes → less overfitting → harder to fool).



3.2. Indirect Encoding on ImageNet

  • Much more successful than the direct encoding.



  • Evolution produces similar images for closely related categories.




3.3. Generalization

3.3.1. Same Architecture & Different Initialization



  • many images that fool network A also fool network B (same architecture, different random initialization)
  • but some fooling images remain specific to one of the two networks

3.3.2. Different Architecture

  • many fooling images generalize across DNN architectures

3.4. Training Networks to Recognize Fooling Images

  • (MNIST) retraining with fooling images added as an extra class does not help much: new fooling images are evolved with confidences similar to before
  • (ImageNet) on the contrary, retraining makes it noticeably harder to evolve new high-confidence fooling images